Search CORE

39 research outputs found

Generalized Max Pooling

Author: Murray Naila
Perronnin Florent
Publication venue
Publication date: 01/01/2014
Field of study

State-of-the-art patch-based image representations involve a pooling operation that aggregates statistics computed from local descriptors. Standard pooling operations include sum- and max-pooling. Sum-pooling lacks discriminability because the resulting representation is strongly influenced by frequent yet often uninformative descriptors, but only weakly influenced by rare yet potentially highly-informative ones. Max-pooling equalizes the influence of frequent and rare descriptors but is only applicable to representations that rely on count statistics, such as the bag-of-visual-words (BOV) and its soft- and sparse-coding extensions. We propose a novel pooling mechanism that achieves the same effect as max-pooling but is applicable beyond the BOV and especially to the state-of-the-art Fisher Vector -- hence the name Generalized Max Pooling (GMP). It involves equalizing the similarity between each patch and the pooled representation, which is shown to be equivalent to re-weighting the per-patch statistics. We show on five public image classification benchmarks that the proposed GMP can lead to significant performance gains with respect to heuristic alternatives.Comment: (to appear) CVPR 2014 - IEEE Conference on Computer Vision & Pattern Recognition (2014

arXiv.org e-Print Archive

CiteSeerX

Crossref

Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation

Author: Cho Minsu
Kang Dahyun
Koniusz Piotr
Murray Naila
Publication venue
Publication date: 07/07/2023
Field of study

We address the task of weakly-supervised few-shot image classification and segmentation, by leveraging a Vision Transformer (ViT) pretrained with self-supervision. Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions through separate task heads. Our model is able to effectively learn to perform classification and segmentation in the absence of pixel-level labels during training, using only image-level labels. To do this it uses attention maps, created from tokens generated by the self-supervised ViT backbone, as pixel-level pseudo-labels. We also explore a practical setup with ``mixed" supervision, where a small number of training images contains ground-truth pixel-level labels and the remaining images have only image-level labels. For this mixed setup, we propose to improve the pseudo-labels using a pseudo-label enhancer that was trained using the available ground-truth pixel-level labels. Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings, and in particular when little-to-no pixel-level labels are available.Comment: Accepted at CVPR 202

arXiv.org e-Print Archive

Low-level spatiochromatic grouping for saliency estimation

Author: Murray Naila
Otazu Porter Xavier
Parraga Carlos Alejandro
Vanrell i Martorell Maria Isabel
Publication venue
Publication date: 01/01/2013
Field of study

We propose a saliency model termed SIM (saliency by induction mechanisms), which is based on a low-level spatiochromatic model that has successfully predicted chromatic induction phenomena. In so doing, we hypothesize that the low-level visual mechanisms that enhance or suppress image detail are also responsible for making some image regions more salient. Moreover, SIM adds geometrical grouplets to enhance complex low-level features such as corners, and suppress relatively simpler features such as edges. Since our model has been fitted on psychophysical chromatic induction data, it is largely nonparametric. SIM outperforms state-of-the-art methods in predicting eye fixations on two datasets and using two metrics

Diposit Digital de Documents de la UAB

Saliency estimation using a non-parametric low-level vision model

Author: Murray Naila
Otazu Porter Xavier
Parraga Carlos Alejandro
Vanrell i Martorell Maria Isabel
Publication venue: Colorado Springs : IEEE,
Publication date: 01/01/2011
Field of study

Many successful models for predicting attention in a scene involve three main steps: convolution with a set of filters, a center-surround mechanism and spatial pooling to construct a saliency map. However, integrating spatial information and justifying the choice of various parameter values remain open problems. In this paper we show that an efficient model of color appearance in human vision, which contains a principled selection of parameters as well as an innate spatial pooling mechanism, can be generalized to obtain a saliency model that outperforms state-of-the-art models. Scale integration is achieved by an inverse wavelet transform over the set of scale-weighted center-surround responses. The scale-weighting function (termed ECSF) has been optimized to better replicate psychophysical data on color appearance, and the appropriate sizes of the center-surround inhibition windows have been determined by training a Gaussian Mixture Model on eye-fixation data, thus avoiding ad-hoc parameter selection. Additionally, we conclude that the extension of a color appearance model to saliency estimation adds to the evidence for a common low-level visual front-end for different visual tasks

Diposit Digital de Documents de la UAB

Revisiting the Fisher vector for fine-grained classification

Author: Gosselin Philippe-Henri
Jégou Hervé
Murray Naila
Perronnin Florent
Publication venue: 'Elsevier BV'
Publication date: 01/11/2014
Field of study

International audienceThis paper describes the joint submission of Inria and Xerox to their joint participation to the FGCOMP'2013 challenge. Although the proposed system follows most of the standard Fisher classification pipeline, we describe a few key features and good practices that significantly improve the accuracy when specifically considering fine-grain classification tasks. In particular, we consider the late fusion of two systems both based on Fisher vectors, but for which we choose drastically design choices that make them very complementary. Moreover, we propose a simple yet effective filtering strategy, which significantly boosts the performance for several class domains

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1